klotz: prompt engineering + large language models


  1. Unusually detailed post on how OpenAI's Codex CLI coding agent works, covering the agent loop, prompt construction, caching, and context window management.

    OpenAI engineer Michael Bolin explains the "agent loop": the cycle in which the agent receives user input, generates code, runs tests, and iterates under human supervision.

    * **Agent Loop Mechanics:** The agent builds prompts with prioritized components (system, developer, user, assistant) and sends them to OpenAI’s Responses API.
    * **Prompt Management:** Because the full conversation is resent each turn, cumulative tokens processed grow quadratically with conversation length; the system copes through caching, compaction, and a stateless API design (allowing for "Zero Data Retention"). Cache misses can significantly impact performance.
    * **Context Window:** Codex automatically compacts conversations to stay within the AI model's context window.
    * **Open Source Focus:** Unlike ChatGPT, the Codex CLI client is open source, suggesting a different approach to development and transparency for coding tools.
    * **Challenges Acknowledged:** The article doesn't shy away from the engineering challenges, like performance issues and bugs encountered during development.
    * **Future Coverage:** Bolin plans to release further posts detailing the CLI’s architecture, tool implementation, and sandboxing model.
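
    A minimal sketch of such a loop, assuming the `openai` Python SDK's Responses API; the model name, stop condition, and compaction policy are placeholders, and command execution is stubbed out:

    ```python
    # Illustrative agent loop, not Codex's actual implementation.
    from openai import OpenAI

    client = OpenAI()

    def compact(history: list[dict], limit: int = 20) -> list[dict]:
        """Stub compaction: keep the developer message plus recent turns so
        the rebuilt prompt stays inside the model's context window."""
        if len(history) <= limit:
            return history
        return history[:1] + history[-(limit - 1):]

    def agent_loop(task: str, max_steps: int = 10) -> str:
        # The prompt is rebuilt every turn from prioritized components:
        # developer instructions first, then the user/assistant history.
        history = [
            {"role": "developer",
             "content": "You are a coding agent. Propose one shell command per turn, or say DONE."},
            {"role": "user", "content": task},
        ]
        reply = ""
        for _ in range(max_steps):
            history = compact(history)
            response = client.responses.create(model="gpt-5", input=history)
            reply = response.output_text
            history.append({"role": "assistant", "content": reply})
            if "DONE" in reply:  # placeholder termination check
                break
            # A real agent would run the proposed command in a sandbox here
            # and feed its output back as the next turn.
            history.append({"role": "user", "content": "Command output: ..."})
        return reply
    ```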
  2. For popular LLMs (Gemini, GPT, Claude, and DeepSeek), repeating the input prompt improves performance without increasing the number of generated tokens or latency, provided reasoning mode is not used.
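
    A sketch of the technique with invented example text; the paper's exact repetition format may differ:

    ```python
    def repeated_prompt(question: str, n: int = 2) -> str:
        """Duplicate the question n times in one prompt. The model still
        emits a single answer, so output tokens and latency are unchanged."""
        return "\n\n".join([question] * n)

    # The prompt contains the question twice; only input tokens increase.
    prompt = repeated_prompt("List three causes of the 2008 financial crisis.")
    ```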
  3. Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
  4. A comprehensive overview of the current state of the Model Context Protocol (MCP), including advancements, challenges, and future directions.
  5. LLM EvalKit is a streamlined framework that helps developers design, test, and refine prompt‑engineering pipelines for Large Language Models (LLMs). It encompasses prompt management, dataset handling, evaluation, and automated optimization, all wrapped in a Streamlit web UI.

    Key capabilities:

    | Stage | What it does | Typical workflow |
    |-------|-------------|------------------|
    | **Prompt Management** | Create, edit, version, and test prompts (name, text, model, system instructions). | Define a prompt, load/edit existing ones, run quick generation tests, and maintain version history. |
    | **Dataset Creation** | Organize data for evaluation. Loads CSV, JSON, JSONL files into GCS buckets. | Create dataset folders, upload files, preview items. |
    | **Evaluation** | Run model‑based or human‑in‑the‑loop metrics; compare outcomes across prompt versions. | Choose prompt + dataset, generate responses, score with metrics like “question‑answering‑quality”, save baseline results to a leaderboard. |
    | **Optimization** | Leverages Vertex AI’s prompt‑optimization job to automatically search for better prompts. | Configure the job (model, dataset, prompt), launch it, and monitor training in the Vertex AI console. |
    | **Results & Records** | Visualize optimization outcomes, compare versions, and maintain a record of performance over time. | View leaderboard, select best optimized prompt, paste new instructions, re‑evaluate, and track progress. |

    **Getting Started**

    1. Clone the repo, set up a virtual environment, install dependencies, and run `streamlit run index.py`.
    2. Configure `src/.env` with `BUCKET_NAME` and `PROJECT_ID`.
    3. Use the UI to create/edit prompts, datasets, and launch evaluations/optimizations as described in the tutorial steps.
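
    For example, `src/.env` might contain (placeholder values):

    ```
    BUCKET_NAME=your-gcs-bucket
    PROJECT_ID=your-gcp-project-id
    ```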

    **Token Use‑Case**

    - **Prompt**: `Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}`
    - **Example input JSON**: query, choices, image URL, target answer.
    - **Model**: `gemini-2.0-flash-001`.
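
    A sketch of how the `{{...}}` placeholders might be filled from one dataset row; the field values and the `render` helper are invented for illustration:

    ```python
    import re

    # Template from the use case above; {{...}} names must match dataset fields.
    TEMPLATE = "Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}"

    row = {  # hypothetical dataset item with the fields listed above
        "query": "Which answer choice best matches the chart?",
        "image": "gs://your-gcs-bucket/images/item-42.jpg",
        "target": "B",
    }

    def render(template: str, fields: dict) -> str:
        """Substitute each {{name}} placeholder with the matching field value."""
        return re.sub(r"\{\{(\w+)\}\}", lambda m: str(fields[m.group(1)]), template)

    print(render(TEMPLATE, row))
    ```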

    **License** – Apache 2.0.
  6. This article explores how prompt engineering can be used to improve time-series analysis with Large Language Models (LLMs), covering core strategies, preprocessing, anomaly detection, and feature engineering. It provides practical prompts and examples for various tasks.
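
    A hypothetical prompt in the spirit of the article's anomaly-detection examples; the readings and wording are invented:

    ```python
    # Inline the series as text and ask for indexed anomalies.
    readings = [101, 99, 100, 102, 98, 100, 167, 101, 100]

    prompt = (
        "You are a time-series analyst. Below are hourly sensor readings.\n"
        f"Readings: {readings}\n"
        "List the indices of any anomalous points and briefly explain why "
        "each one deviates from the local trend."
    )
    ```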
  7. This article explores strategies for effectively curating and managing the context that powers AI agents, discussing the shift from prompt engineering to context engineering and techniques for optimizing context usage in LLMs.
  8. This article introduces the LLM Function Design Pattern, a structured approach to building AI-powered software. It addresses the challenges of integrating Large Language Models (LLMs) into applications by outlining a pattern that promotes modularity, testability, and maintainability. The pattern involves defining clear functions with specific inputs and outputs, and then leveraging LLMs to implement the core logic within those functions.
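
    A minimal sketch of what such a function might look like, assuming the `openai` Python SDK; the task, model name, and output parsing are invented for illustration:

    ```python
    from dataclasses import dataclass
    from openai import OpenAI

    client = OpenAI()

    @dataclass
    class SentimentResult:
        label: str      # "positive" | "negative" | "neutral"
        rationale: str

    def classify_sentiment(text: str) -> SentimentResult:
        """An 'LLM function': a fixed signature and return type, with the
        LLM implementing only the core logic, so callers and tests never
        touch the prompt directly."""
        response = client.responses.create(
            model="gpt-5",  # assumed model name
            input=(
                "Classify the sentiment of the text as positive, negative, "
                "or neutral on the first line, then justify in one "
                f"sentence.\n\nText: {text}"
            ),
        )
        label, _, rationale = response.output_text.partition("\n")
        return SentimentResult(label=label.strip().lower(),
                               rationale=rationale.strip())
    ```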
  9. This guide offers five essential tips for writing effective GitHub Copilot custom instructions, covering project overview, tech stack, coding guidelines, structure, and resources, to help developers get better code suggestions.
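
    A hypothetical `.github/copilot-instructions.md` organized around those five tips; every project detail below is invented:

    ```markdown
    # Project overview
    A REST API for inventory tracking, deployed to AWS Lambda.

    # Tech stack
    Python 3.12, FastAPI, SQLAlchemy, pytest.

    # Coding guidelines
    Use type hints everywhere; raise domain-specific exceptions; no bare `except`.

    # Project structure
    `src/api/` holds routes, `src/models/` holds ORM models, `tests/` mirrors `src/`.

    # Resources
    Follow the internal style guide in `docs/STYLE.md`.
    ```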
  10. This article discusses the concept of 'tool masking' as a way to optimize the interaction between LLMs and APIs, arguing that simply exposing all API functionality (as done by MCP) is inefficient and degrades performance. It proposes shaping the tool surface to match the specific use case, improving accuracy, cost, and latency.
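
    An illustrative contrast with invented, simplified tool schemas: a tool that mirrors the raw API surface versus one masked to a single use case:

    ```python
    # Unmasked: mirrors the whole API, so the model must choose among many
    # endpoints and build arbitrary payloads.
    raw_tool = {
        "name": "issues_api",
        "description": "Full issue-tracker HTTP API",
        "parameters": {
            "endpoint": {"type": "string"},  # any of dozens of endpoints
            "method": {"type": "string"},
            "body": {"type": "object"},      # free-form payload
        },
    }

    # Masked: one narrow operation exposing only the fields this workflow
    # needs, which improves accuracy and cuts prompt tokens.
    file_bug_tool = {
        "name": "file_bug",
        "description": "File a bug against the current repository",
        "parameters": {
            "title": {"type": "string"},
            "severity": {"type": "string", "enum": ["low", "medium", "high"]},
            "steps_to_reproduce": {"type": "string"},
        },
    }
    ```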
